AITopics | covariate distribution

Collaborating Authors

covariate distribution

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Maximum Risk Minimization with Random Forests

Freni, Francesco, Fries, Anya, Kühne, Linus, Reichstein, Markus, Peters, Jonas

arXiv.org Machine LearningDec-12-2025

We consider a regression setting where observations are collected in different environments modeled by different data distributions. The field of out-of-distribution (OOD) generalization aims to design methods that generalize better to test environments whose distributions differ from those observed during training. One line of such works has proposed to minimize the maximum risk across environments, a principle that we refer to as MaxRM (Maximum Risk Minimization). In this work, we introduce variants of random forests based on the principle of MaxRM. We provide computationally efficient algorithms and prove statistical consistency for our primary method. Our proposed method can be used with each of the following three risks: the mean squared error, the negative reward (which relates to the explained variance), and the regret (which quantifies the excess risk relative to the best predictor). For MaxRM with regret as the risk, we prove a novel out-of-sample guarantee over unseen test distributions. Finally, we evaluate the proposed methods on both simulated and real-world data.

maximum mse, maxrm-rf, mse, (14 more...)

arXiv.org Machine Learning

2512.10445

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Switzerland (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.73)

Add feedback

DFW: A Novel Weighting Scheme for Covariate Balancing and Treatment Effect Estimation

Khan, Ahmad Saeed, Schaffernicht, Erik, Stork, Johannes Andreas

arXiv.org Artificial IntelligenceAug-8-2025

Estimating causal effects from observational data is challenging due to selection bias, which leads to imbalanced covariate distributions across treatment groups. Propensity score-based weighting methods are widely used to address this issue by reweighting samples to simulate a randomized controlled trial (RCT). However, the effectiveness of these methods heavily depends on the observed data and the accuracy of the propensity score estimator. For example, inverse propensity weighting (IPW) assigns weights based on the inverse of the propensity score, which can lead to instable weights when propensity scores have high variance-either due to data or model misspecification-ultimately degrading the ability of handling selection bias and treatment effect estimation. To overcome these limitations, we propose Deconfounding Factor Weighting (DFW), a novel propensity score-based approach that leverages the deconfounding factor-to construct stable and effective sample weights. DFW prioritizes less confounded samples while mitigating the influence of highly confounded ones, producing a pseudopopulation that better approximates a RCT. Our approach ensures bounded weights, lower variance, and improved covariate balance.While DFW is formulated for binary treatments, it naturally extends to multi-treatment settings, as the deconfounding factor is computed based on the estimated probability of the treatment actually received by each sample. Through extensive experiments on real-world benchmark and synthetic datasets, we demonstrate that DFW outperforms existing methods, including IPW and CBPS, in both covariate balancing and treatment effect estimation.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.05215

Country: Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Air (1.00)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Structure Maintained Representation Learning Neural Network for Causal Inference

Sun, Yang, Lu, Wenbin, Zhou, Yi-Hui

arXiv.org Machine LearningAug-5-2025

Recent developments in causal inference have greatly shifted the interest from estimating the average treatment effect to the individual treatment effect. In this article, we improve the predictive accuracy of representation learning and adversarial networks in estimating individual treatment effects by introducing a structure keeper which maintains the correlation between the baseline covariates and their corresponding representations in the high dimensional space. We train a discriminator at the end of representation layers to trade off representation balance and information loss. We show that the proposed discriminator minimizes an upper bound of the treatment estimation error. We can address the tradeoff between distribution balance and information loss by considering the correlations between the learned representation space and the original covariate feature space. We conduct extensive experiments with simulated and real-world observational data to show that our proposed Structure Maintained Representation Learning (SMRL) algorithm outperforms state-of-the-art methods. We also demonstrate the algorithms on real electronic health record data from the MIMIC-III database.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Machine Learning

2508.01865

Country:

North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Clustered random forests with correlated data for optimal estimation and inference under potential covariate shift

Young, Elliot H., Bühlmann, Peter

arXiv.org Machine LearningMar-16-2025

We develop Clustered Random Forests, a random forests algorithm for clustered data, arising from independent groups that exhibit within-cluster dependence. The leaf-wise predictions for each decision tree making up clustered random forests takes the form of a weighted least squares estimator, which leverage correlations between observations for improved prediction accuracy. Clustered random forests are shown for certain tree splitting criteria to be minimax rate optimal for pointwise conditional mean estimation, while being computationally competitive with standard random forests. Further, we observe that the optimality of a clustered random forest, with regards to how (population level) optimal weights are chosen within this framework i.e. those that minimise mean squared prediction error, vary under covariate distribution shift. In light of this, we advocate weight estimation to be determined by a user-chosen covariate distribution with respect to which optimal prediction or inference is desired. This highlights a key difference in behaviour, between correlated and independent data, with regards to nonparametric conditional mean estimation under covariate shift. We demonstrate our theoretical findings numerically in a number of simulated and real-world settings.

artificial intelligence, machine learning, random forest, (18 more...)

arXiv.org Machine Learning

2503.12634

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples

Zhang, Thomas T., Lee, Bruce D., Ziemann, Ingvar, Pappas, George J., Matni, Nikolai

arXiv.org Machine LearningOct-14-2024

A driving force behind the diverse applicability of modern machine learning is the ability to extract meaningful features across many sources. However, many practical domains involve data that are non-identically distributed across sources, and statistically dependent within its source, violating vital assumptions in existing theoretical studies. Toward addressing these issues, we establish statistical guarantees for learning general $\textit{nonlinear}$ representations from multiple data sources that admit different input distributions and possibly dependent data. Specifically, we study the sample-complexity of learning $T+1$ functions $f_\star^{(t)} \circ g_\star$ from a function class $\mathcal F \times \mathcal G$, where $f_\star^{(t)}$ are task specific linear functions and $g_\star$ is a shared nonlinear representation. A representation $\hat g$ is estimated using $N$ samples from each of $T$ source tasks, and a fine-tuning function $\hat f^{(0)}$ is fit using $N'$ samples from a target task passed through $\hat g$. We show that when $N \gtrsim C_{\mathrm{dep}} (\mathrm{dim}(\mathcal F) + \mathrm{C}(\mathcal G)/T)$, the excess risk of $\hat f^{(0)} \circ \hat g$ on the target task decays as $\nu_{\mathrm{div}} \big(\frac{\mathrm{dim}(\mathcal F)}{N'} + \frac{\mathrm{C}(\mathcal G)}{N T} \big)$, where $C_{\mathrm{dep}}$ denotes the effect of data dependency, $\nu_{\mathrm{div}}$ denotes an (estimatable) measure of $\textit{task-diversity}$ between the source and target tasks, and $\mathrm C(\mathcal G)$ denotes the complexity of the representation class $\mathcal G$. In particular, our analysis reveals: as the number of tasks $T$ increases, both the sample requirement and risk bound converge to that of $r$-dimensional regression as if $g_\star$ had been given, and the effect of dependency only enters the sample requirement, leaving the risk bound matching the iid setting.

learning, proposition 2, representation, (14 more...)

arXiv.org Machine Learning

2410.11227

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series

Huang, Qiang, Meng, Chuizheng, Cao, Defu, Huang, Biwei, Chang, Yi, Liu, Yan

arXiv.org Artificial IntelligenceAug-16-2024

Counterfactual estimation from observations represents a critical endeavor in numerous application fields, such as healthcare and finance, with the primary challenge being the mitigation of treatment bias. The balancing strategy aimed at reducing covariate disparities between different treatment groups serves as a universal solution. However, when it comes to the time series data, the effectiveness of balancing strategies remains an open question, with a thorough analysis of the robustness and applicability of balancing strategies still lacking. This paper revisits counterfactual estimation in the temporal setting and provides a brief overview of recent advancements in balancing strategies. More importantly, we conduct a critical empirical examination for the effectiveness of the balancing strategies within the realm of temporal counterfactual estimation in various settings on multiple datasets. Our findings could be of significant interest to researchers and practitioners and call for a reexamination of the balancing strategy in time series settings.

estimation, representation, treatment bias, (14 more...)

arXiv.org Artificial Intelligence

2408.08815

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Jilin Province (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.87)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.93)

Add feedback

Revisiting Counterfactual Regression through the Lens of Gromov-Wasserstein Information Bottleneck

Yang, Hao, Sun, Zexu, Xu, Hongteng, Chen, Xu

arXiv.org Machine LearningMay-24-2024

As a promising individualized treatment effect (ITE) estimation method, counterfactual regression (CFR) maps individuals' covariates to a latent space and predicts their counterfactual outcomes. However, the selection bias between control and treatment groups often imbalances the two groups' latent distributions and negatively impacts this method's performance. In this study, we revisit counterfactual regression through the lens of information bottleneck and propose a novel learning paradigm called Gromov-Wasserstein information bottleneck (GWIB). In this paradigm, we learn CFR by maximizing the mutual information between covariates' latent representations and outcomes while penalizing the kernelized mutual information between the latent representations and the covariates. We demonstrate that the upper bound of the penalty term can be implemented as a new regularizer consisting of $i)$ the fused Gromov-Wasserstein distance between the latent representations of different groups and $ii)$ the gap between the transport cost generated by the model and the cross-group Gromov-Wasserstein distance between the latent representations and the covariates. GWIB effectively learns the CFR model through alternating optimization, suppressing selection bias while avoiding trivial latent distributions. Experiments on ITE estimation tasks show that GWIB consistently outperforms state-of-the-art CFR methods. To promote the research community, we release our project at https://github.com/peteryang1031/Causal-GWIB.

covariate, information, latent representation, (16 more...)

arXiv.org Machine Learning

2405.15505

Country:

North America > United States (0.14)
Asia > China (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Data Science (0.92)

Add feedback

Informativeness of Weighted Conformal Prediction

Ying, Mufang, Guo, Wenge, Khamaru, Koulik, Hung, Ying

arXiv.org Machine LearningMay-10-2024

Weighted conformal prediction (WCP), a recently proposed framework, provides uncertainty quantification with the flexibility to accommodate different covariate distributions between training and test data. However, it is pointed out in this paper that the effectiveness of WCP heavily relies on the overlap between covariate distributions; insufficient overlap can lead to uninformative prediction intervals. To enhance the informativeness of WCP, we propose two methods for scenarios involving multiple sources with varied covariate distributions. We establish theoretical guarantees for our proposed methods and demonstrate their efficacy through simulations.

covariate distribution, prediction interval, probability, (13 more...)

arXiv.org Machine Learning

2405.06479

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > New Jersey (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning sources of variability from high-dimensional observational studies

Bridgeford, Eric W., Chung, Jaewon, Gilbert, Brian, Panda, Sambit, Li, Adam, Shen, Cencheng, Badea, Alexandra, Caffo, Brian, Vogelstein, Joshua T.

arXiv.org Machine LearningNov-28-2023

Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, the majority of these methods are often limited to univariate outcomes. Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that these tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, leads to improvements in both finite sample validity and power when compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.

artificial intelligence, discrepancy, machine learning, (17 more...)

arXiv.org Machine Learning

2307.13868

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Tarrant County > Arlington (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Trained Transformers Learn Linear Models In-Context

Zhang, Ruiqi, Frei, Spencer, Bartlett, Peter L.

arXiv.org Machine LearningOct-19-2023

Attention-based neural networks such as transformers have demonstrated a remarkable ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an unseen task, they can formulate relevant per-token and next-token predictions without any parameter updates. By embedding a sequence of labeled training data and unlabeled test data as a prompt, this allows for transformers to behave like supervised learning algorithms. Indeed, recent work has shown that when training transformer architectures over random instances of linear regression problems, these models' predictions mimic those of ordinary least squares. Towards understanding the mechanisms underlying this phenomenon, we investigate the dynamics of ICL in transformers with a single linear self-attention layer trained by gradient flow on linear regression tasks. We show that despite non-convexity, gradient flow with a suitable random initialization finds a global minimum of the objective function. At this global minimum, when given a test prompt of labeled examples from a new prediction task, the transformer achieves prediction error competitive with the best linear predictor over the test prompt distribution. We additionally characterize the robustness of the trained transformer to a variety of distribution shifts and show that although a number of shifts are tolerated, shifts in the covariate distribution of the prompts are not. Motivated by this, we consider a generalized ICL setting where the covariate distributions can vary across prompts. We show that although gradient flow succeeds at finding a global minimum in this setting, the trained transformer is still brittle under mild covariate shifts. We complement this finding with experiments on large, nonlinear transformer architectures which we show are more robust under covariate shifts.

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2306.09927

Country:

Europe > Denmark (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback